Improving Spam Detection Based on Structural Similarity

Authors

  • Luíz Henrique Gomes
  • Fernando D. O. Castro
  • Virgílio A. F. Almeida
  • Jussara M. Almeida
  • Rodrigo B. Almeida
  • Luís M. A. Bettencourt
Abstract

We propose a new spam detection algorithm that uses structural relationships between senders and recipients of email as the basis for spam detection. A unifying representation of users and receivers in the vector space of their contacts is constructed, which leads to a natural definition of similarity between them. This similarity is then used to group email senders and recipients into clusters. Historical information about the messages sent and received by the clusters is obtained by forwarding messages to an auxiliary spam detection algorithm, and this information is used to reclassify messages. In the proposed framework, our algorithm aims to correct misclassifications made by the auxiliary algorithm. A simulation is performed on actual data collected from an SMTP server at a large university. We show that our approach is able to reduce the false positives produced by the auxiliary classification algorithm by up to about 60%.
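The pipeline described in the abstract (contact vectors, a similarity measure, clustering, and cluster-level history) can be illustrated with a minimal sketch. The abstract does not give the similarity formula or the clustering procedure, so cosine similarity, the greedy threshold clustering, the 0.5 threshold, and all function names and sample data below are assumptions chosen for illustration, not the authors' method.

```python
from collections import defaultdict
from math import sqrt


def contact_vectors(log):
    """Map each sender to a sparse vector {recipient: message count}."""
    vectors = defaultdict(lambda: defaultdict(int))
    for sender, recipient in log:
        vectors[sender][recipient] += 1
    return {s: dict(v) for s, v in vectors.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (an assumed similarity measure)."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy grouping: a user joins the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, members)
    for user in sorted(vectors):
        for seed, members in clusters:
            if cosine(vectors[user], seed) >= threshold:
                members.append(user)
                break
        else:
            clusters.append((vectors[user], [user]))
    return [members for _, members in clusters]


def spam_fraction(members, history):
    """Fraction of a cluster's past messages that the auxiliary classifier
    labeled as spam. history maps sender -> (spam_count, total_count)."""
    spam = sum(history[m][0] for m in members if m in history)
    total = sum(history[m][1] for m in members if m in history)
    return spam / total if total else 0.0


# Hypothetical (sender, recipient) pairs standing in for an SMTP log.
log = [
    ("alice", "bob"), ("alice", "carol"),
    ("dave", "bob"), ("dave", "carol"),
    ("spammer", "x1"), ("spammer", "x2"), ("spammer", "x3"),
]
vectors = contact_vectors(log)
groups = cluster(vectors)
```

In this toy log, senders who write to the same recipients end up in one group, while a sender with a disjoint contact list forms its own. A reclassification step could then compare a message's auxiliary label with the `spam_fraction` of its sender's cluster; how the paper actually combines the two is not stated in the abstract.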

Similar resources

An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network

In recent years, there has been considerable interest in using the short message service (SMS) as one of the essential and straightforward communication services on mobile devices. The increased popularity of this service has also increased the number of attacks on mobile devices, such as SMS spam messages. SMS spam messages constitute a real problem for mobile subscribers; this worries telecomm...

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and the Internet predominate in users' lives, they bring, in addition to business opportunities and time savings, threats such as information theft and penetration into systems, in both hardware and software. Security is the top priority in preventing a cyber-attack, and users should first detect the type of attack, because virtual environments are not moni...

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Email is one of the fastest and most economical forms of communication. The growing number of email users has increased the volume of spam in recent years. Spam not only costs users money, time, and bandwidth, but has also become a risk to the efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters; therefore, new filters to de...

A Unified Model for Unsupervised Opinion Spamming Detection Incorporating Text Generality

Many existing methods for review spam detection that consider text content use only simple text features such as content similarity. We explore a novel idea of exploiting text generality to improve spam detection. Besides, apart from the task of review spam detection, although there have also been some works on identifying the review spammers (users) and the manipulated offerings (items),...

BotRevealer: Behavioral Detection of Botnets based on Botnet Life-cycle

Nowadays, botnets are considered as essential tools for planning serious cyberattacks. Botnets are used to perform various malicious activities such as DDoS attacks and sending spam emails. Different approaches are presented to detect botnets; however most of them may be ineffective when there are only a few infected hosts in monitored network, as they rely on similarity in...

Journal title:
  • CoRR

Volume: abs/cs/0504012

Publication date: 2005